Septic shock, or sepsis, is a severe, often deadly disease that affects both developing and developed countries, arising from untreated infection. Sepsis can occur in healthcare settings and is one of the most common adverse events for patients in hospitals (1). Unfortunately, sepsis can be difficult to treat due to antibiotic resistance, as well as different response in patients to treatment. To further explore the response of individuals to sepsis treatment, 31 patients treated for sepsis underwent whole blood RNA sequencing. Of these 31 patients, 17 responded to treatment (R), and 14 did not respond (NR). The sequencing was performed at two time points: upon ICU admission (T1), and 48h after ICU admission (T2). After being admitted to the ICU, the patients received hemodynamic therapy as treatment for sepsis (2).
This dataset is of interest as discovering the underlying expression differences in Responders versus Non-Responders can lead to improved treatment and care options for sepsis patients in the future. Different approaches can be used for Non-Responders to hemodynamic therapy, and thus improve patient outcomes.
The test conditions for the dataset are whether the patients are Responders or Non-Responders. The control conditions include the sepsis treatment received by the patients, the time at which sequencing was performed, and physiological conditions of the patients (i.e. concentration or arterial lactate). As well, patients with certain illnesses (i.e. metastatic cancers) and terminal conditions were excluded.
For the experimental groups R and NR, there were 17 and 14 biological replicates respectively at each time point. The biological replicates come from different patients. There are also no technical replicates in this dataset. The replicates in the two time points come from the same patients, thus each of the replicates are paired.
Set up and download dataset:
suppressWarnings({if (!requireNamespace("BiocManager", quietly = TRUE))
install.packages("BiocManager")
if (!requireNamespace("GEOmetadb", quietly = TRUE))
BiocManager::install("GEOmetadb")
if (!requireNamespace("limma", quietly = TRUE))
BiocManager::install("limma")
if (!requireNamespace("AnnotationDbi", quietly = TRUE))
BiocManager::install("AnnotationDbi")
if (!requireNamespace("org.Hs.eg.db", quietly = TRUE))
BiocManager::install("org.Hs.eg.db")
if (!requireNamespace("edgeR", quietly = TRUE))
BiocManager::install("edgeR")
if (!requireNamespace("readxl", quietly = TRUE))
BiocManager::install("readxl")
if (!requireNamespace("RColorBrewer", quietly = TRUE))
install.packages("RColorBrewer")
if (!requireNamespace("tidyverse", quietly = TRUE))
install.packages("tidyverse")
if (!requireNamespace("ggplot2", quietly = TRUE))
install.packages("ggplot2")
if (!requireNamespace("dplyr", quietly = TRUE))
install.packages("dplyr")})
suppressPackageStartupMessages({
library(tidyverse)
library(edgeR)
library(org.Hs.eg.db)
library(GEOmetadb)
library(RColorBrewer)
library(ggplot2)
library(readxl)
library(dplyr)
library(AnnotationDbi)
library(limma)
})
ds <- "GSE110487"
# Download data
if (!file.exists("/Users/cindyfang/Documents/UofT/FourthYear/BCB420/XinZhi_Fang/A1/GSE110487/GSE110487_rawcounts.xlsx")){sfiles = getGEOSuppFiles(ds)}
sfiles <- rownames(sfiles)[[1]]
Error in h(simpleError(msg, call)) :
error in evaluating the argument 'x' in selecting a method for function 'rownames': object 'sfiles' not found
Map gene ids to HUGO gene symbols and filter:
keep <- data[,2:63] %>%
cpm() %>%
rowSums(.>1) >= 62
dataFilt <- data[keep,]
ens2symbol <- AnnotationDbi::select(org.Hs.eg.db,
key=dataFilt$Geneid,
columns="SYMBOL",
keytype="ENSEMBL")
'select()' returned 1:many mapping between keys and columns
ens2symbol <- as_tibble(ens2symbol)
resFilt <- inner_join(dataFilt, ens2symbol, by=c("Geneid"="ENSEMBL"))
resFilt <- as_tibble(resFilt)
sum(is.na(resFilt$SYMBOL))/nrow(resFilt) # proportion of genes that could not be mapped
[1] 0.05555993
Based on the output of the above chunk, we can see that around 5.5% of the filtered genes could not be mapped to HGNC symbols. Unfortunately, the original dataset did not come with HGNC symbols or any other information for the genes. The unmapped genes have been retained for the downstream analyses of this assignment, but may need to be dropped later on for GSEA as genes in pathways are usually named using HGNC symbols.
Create the groups matrix based on response for normalization later on, and clean the data:
There were 11 warnings (use warnings() to see them)
finalDay1Data <- day1Data
The original dataset was split into two sub-datasets, time point 1 and time point 2. This allows for the comparison of the R and NR groups both before and after treatment. A similar splitting was done by the authors of the study. ## Exploratory data analysis
# create dataframes for plotting
day1bp <- stack(as.data.frame(log2(cpm(day1Data[,2:32]))))
day1bp <- merge(day1bp, expGroups1, by.x="ind", by.y="name")
day2bp <- stack(as.data.frame(log2(cpm(day2Data[,3:32]))))
day2bp <- merge(day2bp, expGroups2, by.x="ind", by.y="name")
# Plot timepoint 1 and timepoint 2 boxplots of log2 cpm
day1box<- ggplot(day1bp, aes(x=ind, y=values, fill=response))+
geom_boxplot()+
xlab("Sample")+
ylab("Log2 CPM")+
ggtitle("RNAseq for Patients at Time Point 1")+
theme(axis.text.x = element_text(angle = 90))
day2box <- ggplot(day2bp, aes(x=ind, y=values, fill=response))+
geom_boxplot()+
xlab("Sample")+
ylab("Log2 CPM")+
ggtitle("RNAseq for Patients at Time Point 2")+
theme(axis.text.x = element_text(angle = 90))
day1box
day2box
Based on the above boxplots, tnere seem to be many outliers for each sample. As well, there were many infinite values that were automatically removed by ggplot. Outliers will be retained in the datasets as there is no indication that they are due to measurement error, so we cannot be sure that they don’t represent biological variation. However, the infinite values have to be dropped as they cannot be handled in downstream analyses. The boxplots also show that the median log2 CPM does not seem to differ much between the NR and R groups at both time points. However, this does not necessarily mean that there is no expression variation between the groups because specific genes may be expressed to different extents in the two groups, which would not be visible in the boxplots.
Density plots:
day1dens <- ggplot(data=day1bp, aes(x=values, group=patient, col=patient)) +
geom_density(adjust=1.5, alpha=.4)+
ylab("Smoothing density of log2 CPM")+
ggtitle("Smoothing density of log2 CPM from RNAseq of patients at timepoint 1")+
facet_wrap(~response)
day2dens <- ggplot(data=day2bp, aes(x=values, group=patient, col=patient)) +
geom_density(adjust=1.5, alpha=.4)+
ylab("Smoothing density of log2 CPM")+
ggtitle("Smoothing density of log2 CPM from RNAseq of patients at timepoint 2")+
facet_wrap(~response)
day1dens
day2dens
The above density plots show slight changes in log2 CPM between the two timepoints. It seems that for both R and NR groups, the density between values of 0 to 5 increased at time point 2. This suggests that both groups experienced changes in expression in between the two time points. However, it is not clear if this is a hallmark of the treatment they received or if it is due to the progression of sepsis.
Discard infinite values to create final datasets:
day1df <- day1Data[!is.infinite(rowSums(log2(cpm(finalDay1Data[,2:32])))),]
day2df <- day2Data[!is.infinite(rowSums(log2(cpm(finalDay2Data[,2:32])))),]
finalDay1Data <- as.data.frame(finalDay1Data)
finalDay2Data <- as.data.frame(finalDay2Data)
MA plots
For the MA plots, each of the datasets were combined into two separate groups, R and NR. This is because MA plots are for two separate samples. Due to the high number of replicates in this study, it would not be practical to make all pairwise MA plots. Thus, all R samples from a time point were grouped together and plotted against the NR samples from the same timepoint. Since the condition of interest is Responders vs Non-Responders, grouping the samples like this gives a general overview of the variation between the two groups.
In the plot of time point 2, it seems that the points show more variation in the y-axis (Expression log-ratio). This could indicate that after treatment, there is some variation in expression in Non-Responders versus Responders.
dnorm1 <- calcNormFactors(d1, method="TMM")
Error in calcNormFactors(d1, method = "TMM") : object 'd1' not found
In addition: There were 15 warnings (use warnings() to see them)
MDS plot:
cols <- brewer.pal(n = 3, name = "Dark2")
mds1 <- plotMDS(dnorm1, col=cols[factor(expGroups1$response)], main="MDS plot for time point 1")
mds2 <- plotMDS(dnorm2, col=cols[factor(expGroups2$response)], main="MDS plot for time point 2")
mds1
An object of class MDS
$dim.plot
[1] 1 2
$distance.matrix
E04T1 E07T1 E15T1 E16T1 E17T1 E18T1 E26T1 E32T1 E33T1
E04T1 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
E07T1 3.001755 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
E15T1 2.939183 2.542361 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
E16T1 2.760137 2.771299 2.581080 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
E17T1 3.312952 3.748560 3.631204 3.188006 0.000000 0.000000 0.000000 0.000000 0.000000
E18T1 3.221368 2.785242 3.047736 2.884886 3.301902 0.000000 0.000000 0.000000 0.000000
E26T1 3.464766 3.272465 3.075570 3.132207 3.472123 3.769658 0.000000 0.000000 0.000000
E32T1 3.202840 3.503655 3.607809 3.483402 2.563712 3.552253 3.233660 0.000000 0.000000
E33T1 2.962634 2.864579 2.492873 2.870962 3.445319 2.926386 3.178826 3.406030 0.000000
E40T1 2.726281 2.774014 2.866236 2.986933 3.638391 2.759948 3.482662 3.559399 2.566308
E41T1 2.623981 3.036964 3.076412 3.145816 2.959483 3.224365 3.524374 2.794654 2.889189
G01T1 3.036117 3.045550 3.014557 2.879794 3.470848 3.120552 3.098013 3.406670 2.933663
G07T1 2.914118 3.207742 3.288339 3.173142 3.362387 3.311806 3.499969 3.280470 3.230626
G08T1 3.043283 3.322523 3.416323 3.208529 3.481711 3.363580 3.685175 3.416645 3.371486
G11T1 3.060426 2.642137 2.861448 2.614180 2.985928 2.513985 2.932911 3.389876 2.930000
G12T1 2.940760 2.754100 3.035094 2.718814 3.260566 2.677916 3.476326 3.628476 3.101715
G23T1 3.049672 2.770299 2.109513 3.012144 3.710861 3.521175 3.031700 3.796390 2.605473
G28T1 3.050422 2.900401 3.052644 2.911510 3.004315 2.911789 3.511727 2.956344 2.988231
G29T1 3.537965 3.544230 3.058349 3.969288 4.255586 4.545805 3.334553 3.798665 3.465389
G33T1 3.291061 3.012490 3.073387 3.055486 3.151684 3.187225 3.413365 3.038718 3.230603
G34T1 2.917831 2.479970 2.450261 2.670093 3.417267 3.160776 2.890090 3.465038 2.676624
G35T1 2.819944 2.787518 3.125362 3.064758 3.383099 2.972325 3.466531 3.115866 3.224564
G39T1 2.800647 3.052342 2.808127 3.284333 3.409986 3.515944 2.748617 3.160208 2.935731
G41T1 3.624231 3.767127 3.590130 3.227871 2.794569 3.632085 3.158420 3.108836 3.679692
G42T1 2.990425 2.805422 2.787726 2.841463 3.283471 2.937743 2.959140 3.098815 2.914597
G43T1 2.960520 2.250212 2.714339 2.886551 3.593880 2.425641 3.328253 3.507101 2.768489
G44T1 2.967503 2.615732 2.743633 2.707142 3.484308 2.386349 3.163530 3.409546 2.824320
G45T1 3.037085 2.424304 2.624141 2.490642 2.950528 2.759086 2.698758 3.139340 2.898915
G47T1 2.833781 2.788845 2.820766 2.971572 3.373630 3.190875 3.280791 3.328013 3.051031
G49T1 2.816013 2.686640 2.502063 2.444967 3.667414 3.150397 2.896829 3.771843 2.741029
G51T1 3.499308 3.503838 3.455280 3.265214 3.016565 3.512380 2.941381 2.970235 3.305574
E40T1 E41T1 G01T1 G07T1 G08T1 G11T1 G12T1 G23T1 G28T1
E04T1 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
E07T1 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
E15T1 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
E16T1 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
E17T1 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
E18T1 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
E26T1 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
E32T1 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
E33T1 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
E40T1 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
E41T1 2.818780 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
G01T1 3.087719 2.871040 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
G07T1 3.350507 2.738830 2.831365 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
G08T1 3.339674 3.143377 3.555881 3.408046 0.000000 0.000000 0.000000 0.000000 0.000000
G11T1 3.071972 3.036883 2.742372 2.743427 3.273691 0.000000 0.000000 0.000000 0.000000
G12T1 2.910982 3.201961 3.019827 3.217737 3.389016 2.285102 0.000000 0.000000 0.000000
G23T1 3.095089 3.207167 3.162265 3.367894 3.625279 2.940417 2.978040 0.000000 0.000000
G28T1 2.894692 2.805321 2.899766 3.007592 3.436923 2.758167 3.026418 3.412305 0.000000
G29T1 3.696062 3.604447 3.829235 3.721188 4.145741 3.803955 4.048763 2.750785 4.230683
G33T1 3.321628 3.015420 3.139269 3.150609 3.459621 2.909468 3.397033 3.535572 2.141742
G34T1 3.009212 2.890676 2.851248 3.135700 3.296866 2.545807 2.609420 2.156969 3.086923
G35T1 3.016759 2.510506 3.115854 2.735202 2.800755 2.850666 3.081783 3.231359 2.719028
G39T1 3.241449 2.860256 3.158112 2.970246 3.334036 2.926738 3.146724 2.530992 3.393826
G41T1 4.051781 3.423734 3.436476 3.565030 3.998755 3.053204 3.409215 3.590179 2.945660
G42T1 3.156981 2.915339 2.880947 3.113640 3.311823 2.682175 2.911689 2.757211 2.521592
G43T1 2.499639 2.949515 2.922495 3.240928 3.289508 2.563232 2.630485 2.879201 2.831590
G44T1 2.650782 2.923480 2.683940 3.060847 3.336863 2.482365 2.810345 3.023013 2.519425
G45T1 2.906623 2.955080 2.981641 3.146410 3.075511 2.344592 2.877922 2.831052 2.459081
G47T1 3.049328 2.769503 3.405049 3.053833 3.035682 2.836271 3.057964 2.714538 3.230525
G49T1 2.778662 3.032706 2.881698 3.243494 3.423673 2.704323 2.917348 2.445294 3.078899
G51T1 3.446433 3.123870 3.094654 3.234711 3.998705 2.771194 3.347774 3.543504 2.613011
G29T1 G33T1 G34T1 G35T1 G39T1 G41T1 G42T1 G43T1 G44T1
E04T1 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
E07T1 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
E15T1 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
E16T1 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
E17T1 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
E18T1 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
E26T1 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
E32T1 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
E33T1 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
E40T1 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
E41T1 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
G01T1 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
G07T1 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
G08T1 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
G11T1 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
G12T1 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
G23T1 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
G28T1 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
G29T1 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
G33T1 4.091734 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
G34T1 3.158382 3.222743 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
G35T1 3.770182 2.837066 2.896758 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
G39T1 2.595368 3.374320 2.675043 2.900827 0.000000 0.000000 0.000000 0.000000 0.000000
G41T1 4.271506 2.686432 3.372199 3.562301 3.399244 0.000000 0.000000 0.000000 0.000000
G42T1 3.627193 2.618144 2.625868 2.821660 2.636313 2.880266 0.000000 0.000000 0.000000
G43T1 3.668705 3.129376 2.667163 2.824916 2.951729 3.747573 2.812884 0.000000 0.000000
G44T1 3.960886 2.757615 2.774645 2.668603 3.003197 3.310002 2.541408 2.375602 0.000000
G45T1 3.565900 2.574273 2.560851 2.635030 2.883284 3.069049 2.458517 2.408826 2.345732
G47T1 3.049567 3.361782 2.741778 2.685401 2.356288 3.816110 2.828711 2.757008 3.054624
G49T1 3.241081 3.255573 2.517952 3.003823 2.855017 3.667032 2.880364 2.735704 2.591819
G51T1 3.951708 2.613909 3.314288 3.313737 3.355384 2.723461 3.082336 3.486527 2.996587
G45T1 G47T1 G49T1 G51T1
E04T1 0.000000 0.000000 0.000000 0
E07T1 0.000000 0.000000 0.000000 0
E15T1 0.000000 0.000000 0.000000 0
E16T1 0.000000 0.000000 0.000000 0
E17T1 0.000000 0.000000 0.000000 0
E18T1 0.000000 0.000000 0.000000 0
E26T1 0.000000 0.000000 0.000000 0
E32T1 0.000000 0.000000 0.000000 0
E33T1 0.000000 0.000000 0.000000 0
E40T1 0.000000 0.000000 0.000000 0
E41T1 0.000000 0.000000 0.000000 0
G01T1 0.000000 0.000000 0.000000 0
G07T1 0.000000 0.000000 0.000000 0
G08T1 0.000000 0.000000 0.000000 0
G11T1 0.000000 0.000000 0.000000 0
G12T1 0.000000 0.000000 0.000000 0
G23T1 0.000000 0.000000 0.000000 0
G28T1 0.000000 0.000000 0.000000 0
G29T1 0.000000 0.000000 0.000000 0
G33T1 0.000000 0.000000 0.000000 0
G34T1 0.000000 0.000000 0.000000 0
G35T1 0.000000 0.000000 0.000000 0
G39T1 0.000000 0.000000 0.000000 0
G41T1 0.000000 0.000000 0.000000 0
G42T1 0.000000 0.000000 0.000000 0
G43T1 0.000000 0.000000 0.000000 0
G44T1 0.000000 0.000000 0.000000 0
G45T1 0.000000 0.000000 0.000000 0
G47T1 2.622049 0.000000 0.000000 0
G49T1 2.483421 2.703375 0.000000 0
G51T1 2.888370 3.737342 3.228014 0
$cmdscale.out
[,1] [,2]
E04T1 -0.307478914 0.1052193
E07T1 -0.643766151 0.7491549
E15T1 -0.975199488 0.0400526
E16T1 0.038866577 0.5464360
E17T1 1.503205734 -0.6364278
E18T1 0.434573653 1.5954392
E26T1 -0.143731616 -1.2155280
E32T1 1.207836065 -1.0799384
E33T1 -0.562666685 0.1996756
E40T1 -0.529989781 0.9229487
E41T1 0.253750366 -0.1656593
G01T1 0.164709608 0.1806431
G07T1 0.274298574 -0.1576792
G08T1 -0.008299294 0.4258536
G11T1 0.288698857 0.3963889
G12T1 -0.029242228 0.8389798
G23T1 -1.385556819 -0.4885929
G28T1 1.147824279 0.4550019
G29T1 -1.826474406 -1.8792195
G33T1 1.240094205 -0.0483144
G34T1 -0.763012735 -0.1034501
G35T1 0.169572833 0.3335401
G39T1 -0.803831086 -1.0716241
G41T1 1.652481001 -1.0275204
G42T1 0.222581751 -0.1352871
G43T1 -0.502487915 0.9879914
G44T1 0.169216876 0.8507657
G45T1 0.163626892 0.1730145
G47T1 -0.877032598 -0.1869413
G49T1 -0.898324290 0.1718193
G51T1 1.325756736 -0.7767420
$top
[1] 500
$gene.selection
[1] "pairwise"
$x
E04T1 E07T1 E15T1 E16T1 E17T1 E18T1 E26T1
-0.307478914 -0.643766151 -0.975199488 0.038866577 1.503205734 0.434573653 -0.143731616
E32T1 E33T1 E40T1 E41T1 G01T1 G07T1 G08T1
1.207836065 -0.562666685 -0.529989781 0.253750366 0.164709608 0.274298574 -0.008299294
G11T1 G12T1 G23T1 G28T1 G29T1 G33T1 G34T1
0.288698857 -0.029242228 -1.385556819 1.147824279 -1.826474406 1.240094205 -0.763012735
G35T1 G39T1 G41T1 G42T1 G43T1 G44T1 G45T1
0.169572833 -0.803831086 1.652481001 0.222581751 -0.502487915 0.169216876 0.163626892
G47T1 G49T1 G51T1
-0.877032598 -0.898324290 1.325756736
$y
E04T1 E07T1 E15T1 E16T1 E17T1 E18T1 E26T1 E32T1
0.1052193 0.7491549 0.0400526 0.5464360 -0.6364278 1.5954392 -1.2155280 -1.0799384
E33T1 E40T1 E41T1 G01T1 G07T1 G08T1 G11T1 G12T1
0.1996756 0.9229487 -0.1656593 0.1806431 -0.1576792 0.4258536 0.3963889 0.8389798
G23T1 G28T1 G29T1 G33T1 G34T1 G35T1 G39T1 G41T1
-0.4885929 0.4550019 -1.8792195 -0.0483144 -0.1034501 0.3335401 -1.0716241 -1.0275204
G42T1 G43T1 G44T1 G45T1 G47T1 G49T1 G51T1
-0.1352871 0.9879914 0.8507657 0.1730145 -0.1869413 0.1718193 -0.7767420
$axislabel
[1] "Leading logFC dim"
mds2
An object of class MDS
$dim.plot
[1] 1 2
$distance.matrix
E04T2 E07T2 E15T2 E16T2 E17T2 E18T2 E26T2 E32T2 E33T2
E04T2 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
E07T2 4.273153 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
E15T2 4.110799 2.362324 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
E16T2 4.080379 2.495456 2.156984 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
E17T2 4.552645 3.821675 3.806777 3.658310 0.000000 0.000000 0.000000 0.000000 0.000000
E18T2 4.443203 2.482602 2.784073 2.796146 3.735785 0.000000 0.000000 0.000000 0.000000
E26T2 4.295604 3.357235 3.305688 2.922538 3.944569 3.045812 0.000000 0.000000 0.000000
E32T2 3.819653 3.408815 3.493635 3.214646 2.780623 3.310612 3.229973 0.000000 0.000000
E33T2 4.220195 2.849149 2.817065 2.609980 3.821292 2.876731 2.995707 3.389701 0.000000
E40T2 3.987317 2.869400 2.737592 2.572035 4.392014 3.304209 3.391464 3.531252 2.896672
E41T2 4.008605 2.873283 2.872156 2.840837 3.641531 3.259970 3.578981 3.173348 2.998415
G01T2 4.291963 2.387111 2.678004 2.708997 4.200671 3.056258 3.221666 3.651198 3.111319
G07T2 4.133216 2.928027 3.014998 3.030383 3.279022 3.073333 3.405859 2.828067 3.528194
G08T2 4.257740 3.481233 3.164025 2.796898 3.828213 3.846413 3.632559 3.345444 3.449158
G11T2 4.405635 2.585746 3.159811 2.871303 3.312964 2.365720 3.153822 2.959185 3.260183
G12T2 3.820603 2.447574 2.669316 2.478493 3.947086 2.685486 3.559495 3.408882 3.189041
G23T2 4.470100 2.708533 2.523563 2.527039 4.470871 3.021439 3.285052 3.977808 2.526455
G28T2 4.083402 2.866391 3.014318 2.888226 3.551549 2.564360 3.311810 2.830127 3.515341
G29T2 4.084442 3.202328 2.877670 2.858968 4.343789 3.360346 3.192777 3.670360 2.888239
G33T2 4.212443 3.156397 3.014113 3.175648 3.870994 2.923673 3.431462 3.192954 3.779990
G34T2 4.275891 2.972821 2.643059 2.486781 3.818389 2.608929 2.813684 3.297010 3.111091
G35T2 3.956901 3.066298 2.950453 2.862819 3.688715 3.203173 3.439121 2.835096 3.532397
G39T2 3.774288 3.188030 2.872276 2.840120 3.652125 2.720735 2.747541 2.953317 3.223013
G41T2 4.204072 3.321514 3.142785 3.081753 3.142639 3.053101 3.119560 2.962350 3.653549
G42T2 3.833883 2.935672 2.728111 2.682553 3.810665 2.925453 3.129251 2.981157 3.144700
G43T2 4.114659 2.391146 2.662706 2.529051 3.940246 2.384343 3.302229 3.188499 2.819941
G44T2 4.281071 3.140817 2.985768 3.100289 3.494828 2.755088 3.144333 2.814413 3.614988
G45T2 4.528831 2.918561 3.150933 2.715878 2.980913 2.609757 3.029352 2.772882 3.354437
G47T2 4.154980 2.806731 3.212120 2.891264 3.229990 2.533611 3.343523 2.813021 2.973882
G49T2 4.180443 3.063351 2.698839 2.696083 4.110480 3.172689 3.200585 3.378725 3.410118
G51T2 3.864339 2.981684 3.037449 2.877994 3.783711 2.825343 3.075637 2.934965 3.429624
E40T2 E41T2 G01T2 G07T2 G08T2 G11T2 G12T2 G23T2 G28T2
E04T2 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
E07T2 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
E15T2 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
E16T2 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
E17T2 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
E18T2 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
E26T2 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
E32T2 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
E33T2 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
E40T2 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
E41T2 2.589449 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
G01T2 2.834090 2.934849 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
G07T2 3.280376 2.903754 2.920813 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
G08T2 2.950357 2.982238 3.345318 3.074093 0.000000 0.000000 0.000000 0.000000 0.000000
G11T2 3.483363 3.398818 3.235234 2.645267 3.570113 0.000000 0.000000 0.000000 0.000000
G12T2 2.902218 3.054802 3.033397 3.087552 3.395069 2.596324 0.000000 0.000000 0.000000
G23T2 2.990328 3.050901 2.868018 3.520851 3.766992 3.430415 3.060423 0.000000 0.000000
G28T2 3.094601 3.302655 3.127990 2.747852 3.515030 2.359016 2.637796 3.488125 0.000000
G29T2 2.970718 2.687674 3.033934 3.261762 3.513341 3.638470 3.319421 2.441072 3.543432
G33T2 3.341141 3.484935 3.094312 2.773349 3.695582 2.712501 3.053299 3.749075 2.306750
G34T2 3.078936 3.116449 2.979401 2.790282 3.425970 2.634331 2.768404 2.826327 2.599702
G35T2 2.740568 2.733498 3.103621 2.509732 2.764143 3.147276 2.977602 3.621345 2.626806
G39T2 3.243650 2.968785 3.296538 2.739382 3.274428 2.935539 3.057534 3.133540 3.002850
G41T2 3.767053 3.553718 3.484009 2.981525 3.908851 2.873436 3.300920 3.835699 2.749545
G42T2 2.845401 3.211933 3.057654 2.890230 3.261488 2.996016 2.805739 3.322121 2.327121
G43T2 2.300236 2.827936 2.717542 3.037297 3.157330 2.636075 2.502526 2.932546 2.459552
G44T2 3.376551 3.482690 3.226354 2.908192 3.583031 2.665803 3.257850 3.638024 2.269742
G45T2 3.496061 3.477891 3.449659 2.807677 3.339261 2.171106 2.986461 3.559340 2.244981
G47T2 3.256723 2.820802 3.366020 2.762324 3.261932 2.527177 2.879619 3.333719 2.734571
G49T2 2.965209 3.169545 2.997997 3.030864 3.207229 3.238257 2.967521 3.219086 2.500274
G51T2 3.100413 3.276736 3.034938 2.900706 3.517057 2.706215 2.874850 3.464477 2.347870
G29T2 G33T2 G34T2 G35T2 G39T2 G41T2 G42T2 G43T2 G44T2
E04T2 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
E07T2 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
E15T2 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
E16T2 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
E17T2 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
E18T2 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
E26T2 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
E32T2 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
E33T2 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
E40T2 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
E41T2 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
G01T2 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
G07T2 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
G08T2 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
G11T2 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
G12T2 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
G23T2 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
G28T2 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
G29T2 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
G33T2 3.637536 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
G34T2 2.815672 2.648982 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
G35T2 3.252641 2.798719 2.974596 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
G39T2 2.613250 3.096432 2.505816 2.731347 0.000000 0.000000 0.000000 0.000000 0.000000
G41T2 3.815491 2.572270 2.807584 3.174427 2.909323 0.000000 0.000000 0.000000 0.000000
G42T2 3.410307 2.728056 2.778318 2.537634 2.881766 2.875106 0.000000 0.000000 0.000000
G43T2 3.235938 3.014467 2.808441 2.708409 2.975611 3.223419 2.401898 0.000000 0.000000
G44T2 3.634752 2.491965 2.738186 2.786755 2.955814 2.611481 2.434032 2.800025 0.000000
G45T2 3.701963 2.928911 2.717388 2.967453 2.957079 2.771070 2.704027 2.633398 2.391102
G47T2 3.201099 3.381305 3.061030 2.847950 2.591824 3.415768 3.114171 2.627800 3.256008
G49T2 3.118198 2.553099 2.547985 2.599871 2.899011 2.999605 2.452132 2.789664 2.599534
G51T2 3.382706 2.362813 2.620095 2.845606 2.898561 2.542587 2.646709 2.846373 2.524489
G45T2 G47T2 G49T2 G51T2
E04T2 0.000000 0.000000 0.000000 0
E07T2 0.000000 0.000000 0.000000 0
E15T2 0.000000 0.000000 0.000000 0
E16T2 0.000000 0.000000 0.000000 0
E17T2 0.000000 0.000000 0.000000 0
E18T2 0.000000 0.000000 0.000000 0
E26T2 0.000000 0.000000 0.000000 0
E32T2 0.000000 0.000000 0.000000 0
E33T2 0.000000 0.000000 0.000000 0
E40T2 0.000000 0.000000 0.000000 0
E41T2 0.000000 0.000000 0.000000 0
G01T2 0.000000 0.000000 0.000000 0
G07T2 0.000000 0.000000 0.000000 0
G08T2 0.000000 0.000000 0.000000 0
G11T2 0.000000 0.000000 0.000000 0
G12T2 0.000000 0.000000 0.000000 0
G23T2 0.000000 0.000000 0.000000 0
G28T2 0.000000 0.000000 0.000000 0
G29T2 0.000000 0.000000 0.000000 0
G33T2 0.000000 0.000000 0.000000 0
G34T2 0.000000 0.000000 0.000000 0
G35T2 0.000000 0.000000 0.000000 0
G39T2 0.000000 0.000000 0.000000 0
G41T2 0.000000 0.000000 0.000000 0
G42T2 0.000000 0.000000 0.000000 0
G43T2 0.000000 0.000000 0.000000 0
G44T2 0.000000 0.000000 0.000000 0
G45T2 0.000000 0.000000 0.000000 0
G47T2 2.685655 0.000000 0.000000 0
G49T2 3.084091 3.454129 0.000000 0
G51T2 2.805970 3.285654 2.235876 0
$cmdscale.out
[,1] [,2]
E04T2 -0.18302458 2.4731956
E07T2 -0.61225138 -0.6327919
E15T2 -0.82672871 -0.3370764
E16T2 -0.70734699 -0.1785265
E17T2 1.69739888 1.1370555
E18T2 0.13326160 -1.0063341
E26T2 -0.05547883 -0.1681898
E32T2 1.18522817 1.0878806
E33T2 -1.15721882 0.1040338
E40T2 -1.27200772 0.2304654
E41T2 -0.87461593 0.8581724
G01T2 -0.93894380 -0.4471428
G07T2 0.55579676 0.3210439
G08T2 -0.44261625 1.0879907
G11T2 0.87656300 -0.7009843
G12T2 -0.35103555 -0.2532601
G23T2 -1.74482700 -0.6276455
G28T2 0.90321305 -0.5542049
G29T2 -1.50095184 0.3459996
G33T2 0.91756122 -0.6476270
G34T2 -0.04703599 -0.7025779
G35T2 0.22473342 0.5451274
G39T2 0.01760296 0.4040323
G41T2 1.33810242 -0.2408109
G42T2 0.25503078 -0.1356713
G43T2 -0.39168812 -0.4621395
G44T2 1.10716597 -0.5534135
G45T2 1.14591475 -0.5241496
G47T2 0.18464888 0.3842889
G49T2 -0.04466123 -0.4653136
G51T2 0.60821089 -0.3414269
$top
[1] 500
$gene.selection
[1] "pairwise"
$x
E04T2 E07T2 E15T2 E16T2 E17T2 E18T2 E26T2
-0.18302458 -0.61225138 -0.82672871 -0.70734699 1.69739888 0.13326160 -0.05547883
E32T2 E33T2 E40T2 E41T2 G01T2 G07T2 G08T2
1.18522817 -1.15721882 -1.27200772 -0.87461593 -0.93894380 0.55579676 -0.44261625
G11T2 G12T2 G23T2 G28T2 G29T2 G33T2 G34T2
0.87656300 -0.35103555 -1.74482700 0.90321305 -1.50095184 0.91756122 -0.04703599
G35T2 G39T2 G41T2 G42T2 G43T2 G44T2 G45T2
0.22473342 0.01760296 1.33810242 0.25503078 -0.39168812 1.10716597 1.14591475
G47T2 G49T2 G51T2
0.18464888 -0.04466123 0.60821089
$y
E04T2 E07T2 E15T2 E16T2 E17T2 E18T2 E26T2 E32T2
2.4731956 -0.6327919 -0.3370764 -0.1785265 1.1370555 -1.0063341 -0.1681898 1.0878806
E33T2 E40T2 E41T2 G01T2 G07T2 G08T2 G11T2 G12T2
0.1040338 0.2304654 0.8581724 -0.4471428 0.3210439 1.0879907 -0.7009843 -0.2532601
G23T2 G28T2 G29T2 G33T2 G34T2 G35T2 G39T2 G41T2
-0.6276455 -0.5542049 0.3459996 -0.6476270 -0.7025779 0.5451274 0.4040323 -0.2408109
G42T2 G43T2 G44T2 G45T2 G47T2 G49T2 G51T2
-0.1356713 -0.4621395 -0.5534135 -0.5241496 0.3842889 -0.4653136 -0.3414269
$axislabel
[1] "Leading logFC dim"
The multidimensional scaling plots above show that the R and NR groups do not seem to differ significantly in the first two leading dimensions at either time point. However, this does not necessarily mean that the expression of the two groups do not differ at all as we do not know the proportion of variance explained by each of the leading dimensions. If the first two dimensions only explain a very small amount of the overall variance in the datasets, it would make sense to not see the two experimental groups separating from each other in the plots.
An interesting feature of the plot at time point 2 is the large gap between sample E04T2 and the other samples. E04T2 is classified as NR, but it would be interesting to see if this patient has a unique disease or condition that resulted in this large difference in expression. Dispersion:
model_design <- model.matrix(~expGroups1$response)
disp1 <- estimateDisp(dnorm1, model_design)
bcv1 <- plotBCV(disp1, col.tagwise="black")
model_design <- model.matrix(~expGroups2$response)
disp2 <- estimateDisp(dnorm2, model_design)
bcv2 <- plotBCV(disp2, col.tagwise="black")
bcv1
NULL
bcv2
NULL
The BCV plots above show that the genes above the red line can be considered as differentially expressed. At both time points, it seems that there is a similar number of differentially expressed genes. This suggests that there may be some inherent expression differences between R and NR groups, as after receiving treatment the same number of genes are differentially expressed. However, since the BCV plots do not tell us which genes are differentially expressed, this hypothesis will have to be tested in downstream analyses.
Mean-variance relationship:
plotMeanVar(disp1, show.raw.vars = TRUE,
Warning messages:
1: In readChar(file, size, TRUE) : truncating string with embedded nuls
2: In readChar(file, size, TRUE) : truncating string with embedded nuls
3: In readChar(file, size, TRUE) : truncating string with embedded nuls
4: In readChar(file, size, TRUE) : truncating string with embedded nuls
5: In readChar(file, size, TRUE) : truncating string with embedded nuls
6: In readChar(file, size, TRUE) : truncating string with embedded nuls
7: In readChar(file, size, TRUE) : truncating string with embedded nuls
8: In readChar(file, size, TRUE) : truncating string with embedded nuls
9: In readChar(file, size, TRUE) : truncating string with embedded nuls
show.tagwise.vars=TRUE,
NBline=TRUE, show.ave.raw.vars = TRUE,
show.binned.common.disp.vars = FALSE)
plotMeanVar(disp1, show.raw.vars = TRUE,
show.tagwise.vars=TRUE,
NBline=TRUE, show.ave.raw.vars = TRUE,
show.binned.common.disp.vars = FALSE)
The mean-variance plots at both time points show that both R and NR groups exhibit similar amounts of variation in their gene expression. This is a useful addition to the BCV plots as we now know that not only does variation in expression not change much between two time points, but also the two experimental groups show similar amounts of variation at both time points. This is seen in the mean-variance plots as the grey and blue points overlap and follow the same general trend.
The data was cleaned in order to produce two sub-datasets with HGNC symbols as rownames:
head(finalDay1Data)
head(finalDay2Data)
Unfortunately, 3 of the rownames are not unique: BAZ2B’, ‘GTF2H2C’, ‘OPN3’. The authors of the study also ran into the same problem, and chose to discard these genes. I discarded these genes as well in my final dataset.
The data was also normalized using TMM normalization. I chose this method because normalization based on library size can result in differential expression results being skewed for one experimental condition (13), which would be undesirable when comparing expression in Responders and Non-Responders. Plotting the normalized data:
normdf1 <- as.data.frame(stack(log2(cpm(dnorm1$counts))))
ggplot(normdf1, aes(x=col, y=value))+
geom_boxplot()+
xlab("Sample")+
ylab("Log2 CPM")+
ggtitle("RNAseq for Patients at Time Point 1")+
theme(axis.text.x = element_text(angle = 90))
ggplot(data=normdf1, aes(x=value, group=col, col=col)) +
geom_density(adjust=1.5, alpha=.4)+
ylab("Smoothing density of log2 CPM")+
ggtitle("Smoothing density of log2 CPM from Normalized RNAseq of patients at time point 1")
normdf2 <- as.data.frame(stack(log2(cpm(dnorm2$counts))))
ggplot(normdf2, aes(x=col, y=value))+
geom_boxplot()+
xlab("Sample")+
ylab("Log2 CPM")+
ggtitle("RNAseq for Patients at Time Point 2")+
theme(axis.text.x = element_text(angle = 90))
ggplot(data=normdf2, aes(x=value, group=col, col=col)) +
geom_density(adjust=1.5, alpha=.4)+
ylab("Smoothing density of log2 CPM")+
ggtitle("Smoothing density of log2 CPM from Normalized RNAseq of patients at time point 2")
The final coverage of this dataset after filtering is 12,696 genes, which is around half of the total amount of genes in the human genome. However, around 5% of these genes did not map to any HGNC symbols, which effectively leaves us with closer to 12,000 genes that can be used in downstream analyses. This is on the lower end of the requisite coverage, but should still be sufficient for enrichment analysis.
Another issue that occurred when mapping to HGNC symbols is that some expression values were mapped to the same symbol. This was addressed by first filtering out lowly expressed genes and then mapping, as lowly expressed genes may be due to technical errors or other issues. However, even after filtering, there were some expression values that mapped to the same symbol. In the end, there were 3 duplicated HGNC symbols. The authors of the study mapped their genes using the GRCh38 human reference genome, and also had multiple expression values mapping to the same symbol. The authors chose to discard these genes. In my mapping, I used the R package org.Hs.eg.db, which is based on the 2020 August UCSC Genome Bioinformatics human genome assembly. After checking the release date of hg38, it seems that it should be the same build as the one used by the authors in the study. Thus, for downstream analyses, I have discarded the genes that were mapped to the same symbols so that enrichment analyses can be run.
In the filtering step, 45400 low count genes were filtered out. The original dataset contained 58096 genes, whereas the filtered dataset contained 12696 genes. This may be considered as discarding outliers, but since these low count genes are likely due to technical conditions, I think it is acceptable to remove them. Other than these, no outliers were removed as it is unclear whether or not they represent biological variation. However, for the plots above, genes with infinite counts after the log 2 CPM transformation were discarded. I think this is a reasonable measure as mathematically it’s not possible to handle these genes, so going forward I will be removing them as well.This will result in 884 genes being removed from the time point 1 dataset, and 692 being removed from the time point 2 dataset.
In this assignment, the selected dataset was cleaned, filtered, and normalized. As well, exploratory data analysis was performed in order to determine possible trends in the data. Based on these initial results, we cannot say conclusively whether there are significant differences in expression between the R and NR groups at either time point. However, it will be exciting to see what results from enrichment analyses, as finding differences in expression between Responders and Non-Responders to hemotherapy treatment can lead to the development of other treatments for sepsis.